K-nearest Neighbor Search by Random Projection Forests
K-nearest neighbor (kNN) search has wide applications in many areas,
including data mining, machine learning, statistics and many applied domains.
Inspired by the success of ensemble methods and the flexibility of tree-based
methodology, we propose random projection forests (rpForests), for kNN search.
rpForests finds kNNs by aggregating results from an ensemble of random
projection trees with each constructed recursively through a series of
carefully chosen random projections. rpForests achieves remarkable accuracy:
both the missing rate of kNNs and the discrepancy in the kNN distances decay
quickly. rpForests has very low computational complexity. The
ensemble nature of rpForests makes it easy to run in parallel on multicore or
clustered computers; the running time is expected to be nearly inversely
proportional to the number of cores or machines. We give theoretical insights
by showing the exponential decay of the probability that neighboring points
would be separated by ensemble random projection trees when the ensemble size
increases. Our theory can be used to refine the choice of random projections in
the growth of trees, and experiments show that the effect is remarkable.
Comment: 15 pages, 4 figures, 2018 IEEE Big Data Conference
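The ensemble idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the median split, the uniformly random unit direction, and the names `build_tree`, `leaf_for`, and `knn_forest` are assumptions made for illustration, and the paper's "carefully chosen" projections are not reproduced here. Each tree recursively splits the points along a random projection; a query collects the points in its leaf from every tree and ranks those candidates by exact distance.

```python
import math
import random

def random_direction(dim, rng):
    # Draw a random unit vector (assumption: isotropic Gaussian direction).
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def build_tree(points, idx, leaf_size, rng):
    # Stop splitting once the node is small enough to be a leaf.
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    direction = random_direction(len(points[0]), rng)
    proj = {i: sum(a * b for a, b in zip(points[i], direction)) for i in idx}
    ordered = sorted(idx, key=proj.get)
    mid = len(ordered) // 2
    # Assumption: split at the median of the projected values.
    split = (proj[ordered[mid - 1]] + proj[ordered[mid]]) / 2.0
    return {
        "dir": direction,
        "split": split,
        "left": build_tree(points, ordered[:mid], leaf_size, rng),
        "right": build_tree(points, ordered[mid:], leaf_size, rng),
    }

def leaf_for(tree, q):
    # Route the query down one tree to the leaf it falls into.
    while "leaf" not in tree:
        p = sum(a * b for a, b in zip(q, tree["dir"]))
        tree = tree["left"] if p <= tree["split"] else tree["right"]
    return tree["leaf"]

def knn_forest(points, forest, q, k):
    # Aggregate leaf candidates over all trees, then rank by exact distance.
    candidates = set()
    for tree in forest:
        candidates.update(leaf_for(tree, q))
    return sorted(candidates, key=lambda i: math.dist(points[i], q))[:k]

rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
forest = [build_tree(points, list(range(len(points))), 10, rng) for _ in range(8)]
neighbors = knn_forest(points, forest, points[3], k=5)
```

Because the trees are built independently, the list comprehension that builds `forest` is the part that could be distributed across cores or machines.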
Characterization the regulation of herpesvirus miRNAs from the view of human protein interaction network
Background: miRNAs are a class of non-coding RNA molecules that play crucial roles in the regulation of virus-host interactions. The ever-increasing data on known viral miRNAs and the human protein interaction network (PIN) have made it possible to study the targeting characteristics of viral miRNAs in the context of these networks.
Results: We performed topological analysis to explore the targeting propensities of herpesvirus miRNAs from the view of the human PIN and found that (1) herpesvirus miRNAs target significantly more hubs; moreover, compared with non-hubs (non-bottlenecks), hubs (bottlenecks) are targeted by many more viral miRNAs and virus types; (2) there are significant differences in degree and betweenness centrality between common and specific targets; specifically, we observed a significant positive correlation between the number of virus types targeting these nodes and the proportion of hubs; and (3) K-core and ER analysis determined that common targets are closer to the global PIN center. Compared with random conditions, the giant connected component (GCC) and the density of the sub-network formed by common targets have significantly higher values, indicating the modular character of these targets.
Conclusions: Herpesvirus miRNAs preferentially target hubs and bottlenecks. There are significant differences between common and specific targets. Moreover, common targets are more intensely connected and occupy the central part of the network. These results will help unravel the complex mechanism of herpesvirus-host interactions and may provide insight into the development of novel anti-herpesvirus drugs.
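The hub-targeting comparison described in this abstract can be sketched on a toy network. Everything in the example is hypothetical: the edge list, the target set, the 20% degree cutoff used to define hubs, and the helper names `degrees`, `hubs`, and `hub_fraction` are assumptions for illustration and are not taken from the study.

```python
from collections import defaultdict

def degrees(edges):
    # Count how many interactions each protein takes part in.
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def hubs(deg, top_fraction=0.2):
    # Assumption: hubs are the top 20% of nodes by degree (ties broken by name).
    ranked = sorted(deg, key=lambda n: (-deg[n], n))
    count = max(1, round(len(ranked) * top_fraction))
    return set(ranked[:count])

def hub_fraction(nodes, hub_set):
    # Share of the given nodes that are hubs.
    nodes = list(nodes)
    return sum(n in hub_set for n in nodes) / len(nodes) if nodes else 0.0

# Hypothetical toy interaction network and miRNA target set.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C"),
         ("E", "F"), ("F", "G"), ("G", "H"), ("H", "I"), ("I", "J")]
targets = {"A", "B", "F"}

deg = degrees(edges)
hub_set = hubs(deg)
target_share = hub_fraction(targets, hub_set)
background_share = hub_fraction(deg, hub_set)
```

A target share well above the background share is the kind of enrichment the abstract reports; a real analysis would also need a significance test against randomized target sets.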
Adonis: Practical and Efficient Control Flow Recovery through OS-Level Traces
Control flow recovery is critical to ensuring software quality, especially for large-scale software in production environments.
However, the efficiency of most current control flow recovery techniques is compromised due to their runtime overheads along with
deployment and development costs. To tackle this problem, we propose a novel solution, Adonis, which harnesses OS-level traces,
such as dynamic library calls and system call traces, to efficiently and safely recover control flows in practice. Adonis operates in
two steps: it first identifies the call-sites of trace entries, then it executes a pair-wise symbolic execution to recover valid execution
paths. This technique has several advantages. First, Adonis does not require the insertion of any probes into existing applications,
thereby minimizing runtime cost. Second, given that OS-level traces are hardware-independent, Adonis can be implemented across
various hardware configurations without the need for hardware-specific engineering efforts, thus reducing deployment cost. Third, as
Adonis is fully automated and does not depend on manually created logs, it circumvents additional development cost. We conducted an
evaluation of Adonis on representative desktop applications and real-world IoT applications. Adonis can faithfully recover the control
flow with 86.8% recall and 81.7% precision. Compared to the state-of-the-art log-based approach, Adonis not only covers all the
execution paths that approach recovers, but also recovers 74.9% of the statements that it cannot cover. In addition, the runtime cost of Adonis is
18.3× lower than that of the instrumentation-based approach; the analysis time and storage cost (indicative of the deployment cost) of Adonis are
50× smaller and 443× smaller than those of the hardware-based approach, respectively. To facilitate future replication and extension of this
work, we have made the code and data publicly available.